Abstract

In photography, low depth of field (DOF) is an important technique to emphasize the object of interest (OOI) within an image. Thus, low DOF images are widely used in macro, portrait and sports photography. When viewing a low DOF image, the viewer implicitly concentrates on the sharper regions of the image and thus segments the image into regions of interest and regions of non-interest, which has a major impact on the perception of the image. Thus, a robust algorithm for the fully automatic detection of the OOI in low DOF images provides valuable information for subsequent image processing and image retrieval. In this paper, we propose a robust and parameterless algorithm for the fully automatic segmentation of low DOF images. We compare our method with three similar methods and show its superior robustness, even though our algorithm does not require any parameters to be set by hand. The experiments are conducted on a real-world data set with high and low DOF images.

1 Introduction 

In photography, low depth of field (DOF) is an important technique to emphasize the object of interest (OOI) within an image. Low DOF images are usually characterized by a certain region that is displayed very sharp, like the face of a person, and blurry image regions that lie significantly in front of or behind the object of interest.

Low DOF images are well known from sports, portrait or macro photography, where only a specific part of the image should attract most of the viewer's attention. The OOI is thereby displayed sharp while other areas like the background appear blurred, so that the viewer automatically focuses on the sharp areas of the image. When viewing a low depth of field image, the viewer implicitly segments the image into regions of interest and regions of less interest (usually background). As this implicit segmentation has a major impact on the perception of the image, this information is a valuable feature for the subsequent image processing chain, such as adaptive image compression [6], or for image retrieval aspects such as the similarity of images, which can be considerably influenced by the image's DOF. Given, for example, two images displaying a person in the sharp image region in front of different, blurred backgrounds, people might judge both pictures to be similar even though the blurred backgrounds differ. Although this implicit segmentation is rather easy for a human viewer of the photo, it is not an easy task for a completely unsupervised algorithm. This is because there is usually no sharp edge that divides the sharp OOI from the blurred background. Depending on the camera's settings, this transition can be very smooth, so that it is hard to determine where the OOI ends.

With the vastly growing market of consumer DSLRs, and even of new small compact cameras like the Sony Cybershot that are explicitly advertised with the ability to take low DOF photos, the number of low DOF photos has also increased. This growing number of low DOF images may also provide new information for established search and retrieval systems if they take the OOI into account when performing similarity search tasks. In order to profit from the low DOF information, search engines and feature extraction algorithms need fully automatic and robust image segmentation algorithms that can separate the OOI from the rest of the image. For large search engines or image stock agencies, such algorithms should also be independent of the image domain, the image size and the color depth of the image, so that the algorithm performs well regardless of the color or toning of the photo (e.g. black-and-white, color or sepia photos).

In this paper, we propose a robust, fully automatic and parameterless algorithm for the segmentation of low DOF images, as well as an analysis of the impact of low DOF information on similarity search. The algorithm does not need any a priori knowledge such as the image domain or camera settings. It also provides meaningful results even if the DOF is rather large, so that the background contains significant structures. The rest of the paper is organized as follows: In Sec. 2 we review related work and some technical background, followed by the explanation of the algorithm in Sec. 3. In Sec. 4 we explain our experimental evaluation of the algorithm. In Sec. 5 we describe the internal parameter settings and threshold values. The impact of the DOF segmentation on image similarity is shown in Sec. 6. Finally, we conclude the paper and give an outlook in Sec. 7.

2 Related work 

The segmentation of low DOF images has gained some interest in the research community in past years. In [12], [14] early approaches to segment low DOF images were presented. Thereby, [14] uses an edge-based algorithm which first converts a gray-scale image into an edge representation, which is then filtered. Afterwards, the edges are linked to form closed boundaries. These boundaries are treated with a region-filling process, generating the final result. [12] presents a fully automatic segmentation algorithm using block-based multiresolution wavelet transformations on gray-scale images. Even though the paper reports high sensitivity and specificity on the test set, the authors also name some limitations, such as the dependence on very low DOF, a fully focused OOI, and high image resolution and quality. In [16], [15] high-frequency wavelets are used to determine the segmentation of low DOF images. As stated in [8], these features have the drawback of not being very robust if used alone, and thus often result in errors in both focused and defocused regions if the defocused background shows some busy textures or if the focused foreground does not have very significant textures. In [10], localized blind deconvolution is proposed to determine the focus map of an image. Yet the authors do not propose a pure image segmentation algorithm, as the focus map is not a true segmentation but a measure for the amount of focus in each part of the image. Also, the algorithm does not take into account any color information, as it only operates on gray-scale images. The works proposed in [9], [13], [7], [8] are consecutive works on the segmentation of DOF images and of image sequences [9], [13], such as movies, which address a similar topic. In this paper, we were inspired by the algorithm proposed in [7], which uses morphological filtering for the segmentation. This algorithm had problems with backgrounds that showed significant structures and with photos taken at high ISO values. The algorithm also showed some problems if spatially separated OOIs were shown in a single image. Another problem can be caused by the size of the structuring element used in the algorithm [7, Sec. IV].

We compare our algorithm with the work of [7] and with [11], where single frames of videos are processed into a saliency map, which is then processed by morphological filters. The resulting tri-map is used for error control and for the extraction of the boundaries of the focused regions. We also compare our work to the algorithm proposed in [18], where a fuzzy segmentation approach first separates the image into regions using mean shift. These regions are then characterized by color features and wavelet modulus maxima edge point densities. Finally, the region of interest and the background are separated by defuzzification on the fuzzy sets generated in the previous step. Our test image data set consists of various photos and comprises several categories from high to low DOF images.

2.1 Depth of Field 

In optics, the DOF denotes the depth of the sharp area around the focal point of a lens, as seen from the photographer. Technically, a lens can only focus at one distance at a time. This distance defines the focal plane, which is orthogonal to the photographer's view through the lens. Precisely, only objects directly on the focal plane are absolutely sharp, while objects in front of or behind the focal plane are displayed unsharp. With increasing distance from the focal plane, the sharpness of the displayed object decreases. Nevertheless, there is a certain range in front of and behind the focal plane in which objects are still perceived as sharp before a blur becomes noticeable. The depth of this region is called the DOF. As the sharpness decreases gradually with increasing distance from the focal plane, it is hard to determine an exact range for the DOF, since the limits of the sharp area are only defined by the perceived sharpness.

Points in the defocused areas appear blurred to a certain degree. This is often modeled by a Gaussian kernel as in Eq. 1, where $\sigma$ denotes the spread parameter, which affects the strength of the blur. For a given image $I$, the blurred representation can then be created by the convolution $G_\sigma * I$.

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) \qquad (1)$$
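A minimal Python sketch of Eq. 1 and of the blur $G_\sigma * I$ follows. It samples the kernel on an integer grid, truncates it at radius $\lceil 3\sigma \rceil$ and renormalizes the weights to sum to 1. The image is assumed to be a single-channel list of rows, and border pixels are replicated; the truncation radius and the border handling are our assumptions, not choices stated in the paper.

```python
import math


def gaussian_kernel(sigma):
    """Sampled 2-D Gaussian G_sigma(x, y) from Eq. 1, truncated at 3*sigma and normalized."""
    radius = max(1, math.ceil(3 * sigma))
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def gaussian_blur(image, sigma):
    """Convolve a 2-D list `image` with G_sigma, replicating border pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so pixels outside the image repeat the border value.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + radius][dx + radius] * image[yy][xx]
            out[y][x] = acc
    return out
```

Because $G_\sigma$ is symmetric, convolution and correlation coincide, so the kernel is applied without flipping.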

The effect of DOF is mainly determined by the camera, namely its imaging sensor size, the aperture, and the distance to the focused object. The larger the sensor or the aperture, the smaller the DOF. Increasing the distance from the camera to the focused object expands the resulting DOF. Figure 1 illustrates the geometry of DOF for a symmetrical lens.




Fig. 1: Illustration of the depth of field. The size of the perceived sharp area around the focal plane denotes the DOF.

2.2 Automatic segmentation of low DOF images

Automatic segmentation of images is more challenging than interactive approaches, because no additional human input can be used to adapt parameter values for the segmentation process. However, the advantages of a fully automated algorithm are obvious if the algorithm is deployed in a system that receives large numbers of images and the segmentation should be available as fast as possible. This is, for example, the case for search indexes or photo communities like Flickr or Google's Picasa, where several thousand photos are uploaded each minute, even if not all of them are low DOF images.

A segmentation algorithm should be able to handle different types (grayscale or color), orientations (landscape or portrait) and resolutions (from small to large) of images, independent of camera settings like ISO. Many automatic segmentation approaches for low DOF images have some of these restrictions, as seen in [18], which only performs well on color images. Grayscale images mostly fail because the extracted color features are too few to characterize regions and distinguish them sufficiently.

However, other algorithms, like the one presented in [7], can only process grayscale images. In such cases, color images have to be transformed, and hence their color information can no longer contribute to improving the segmentation quality. As shown in our experimental results in Sec. 4, images that consist of complex defocused regions can cause poor segmentation results, because too many false positives are found. In this context, false positives denote the set of pixels that are defined as background by the underlying ground truth but classified as OOI pixels by the segmentation algorithm.

In the following section, we describe our algorithm, which does not suffer from any of the restrictions mentioned above. To this end, we use a robust method for calculating the amount of sharpness of a pixel in relation to its neighbors by taking advantage of the L*a*b* color space, which offers a more accurate match between numerical and perceived color differences. The L*a*b* model was favored over the well-known RGB and CMYK color spaces, as it is designed to approximate human vision better than these color spaces.

To address the problems caused by images consisting of numerous less blurred pixel regions showing complex structures, we apply a density-based clustering algorithm to all detected sharp pixels. This enables our algorithm to distinguish between sharp pixels belonging to the main focus region of the OOI (if these pixels belong to the largest found cluster) and noise pixels located in background structures.



3 Algorithm 

The proposed algorithm consists of the following five stages: Deviation Scoring, Score Clustering, Mask Approximation, Color Segmentation and Region Scoring. Before explaining the steps of the algorithm in detail, we first give a brief summary of the complete algorithm. Fig. 2 illustrates the steps of the algorithm.

The first stage of the algorithm, called Deviation Scoring, identifies sharp pixel areas in the image. To this end, a Gaussian blur is applied to the original image. The difference between the edges extracted from the original image and those extracted from the blurred image is then calculated. For each pixel, this difference represents a score value, with higher score values indicating sharper pixels and lower score values indicating blurred pixels.



In the second stage, called Score Clustering, all pix- 
els with a score value above a certain threshold are 
clustered by using a density-based clustering algo- 
rithm. Thus, isolated sharp pixels are recognized as 
noise and only large clusters are processed further. 

The third stage, named Mask Approximation, generates a nearly closed area (containing almost no holes) from the discrete points of each remaining cluster. This is achieved by computing the convex hull of the neighbors of each dense pixel. Each polygon created this way is then filled, and the union of these filled regions represents the approximate mask of the main focus region. In the next two stages, this approximate mask is refined.

The fourth stage, called Color Segmentation, then divides the approximate mask into regions that contain pixels of similar color in the original image.

In the fifth stage, named Region Scoring, a relevance value is calculated for each region. This relevance value is directly influenced by the score values of the pixels surrounding the respective region. The final segmentation mask is then created by removing all regions that have a relevance value below a certain threshold.



3.1 Deviation Scoring 

In the first step, we need to identify sharp pixels as an indication of the focused objects within the image. The well-known Canny edge detector [2] is not suitable in this case, because it operates on gray-scale images and not in the L*a*b* color space. Furthermore, the Canny operator does not aim at the detection of single edge pixels but at the robust detection of lines of edges, even in partly blurred areas of the image (cf. Fig. 3b on page 6). The HOS map used in [7] is defined as in Eq. 2:



$$HOS(x,y) = \min\left(255,\ \frac{\hat{m}^{(4)}(x,y)}{DBF}\right) \qquad (2)$$

where $DBF$ represents a down-scaling factor of 100 and the fourth-order moment $\hat{m}^{(4)}$ at $(x,y)$ is given by

$$\hat{m}^{(4)}(x,y) = \frac{1}{N_\eta} \sum_{(s,t)\in\eta(x,y)} \big(I(s,t) - \hat{m}(x,y)\big)^4,$$

where $\hat{m}$ is the sample mean, defined as in Eq. 3:

$$\hat{m}(x,y) = \frac{1}{N_\eta} \sum_{(s,t)\in\eta(x,y)} I(s,t) \qquad (3)$$

Here, $\eta(x,y)$ is the set of neighborhood pixels with center $(x,y)$, set to size 3×3, and $N_\eta$ denotes its cardinality. Using the HOS map also has the disadvantage that it operates only on gray-scale images. Additionally, the HOS map is too sensitive in the case of textured backgrounds, as it only produces reasonable results if the background is significantly blurred. This works for images with very low DOF, but as soon as the DOF is not very small, the HOS map detects too many sharp areas in the background (cf. Fig. 3c).

Fig. 2 panels: (a) Input Image, (b) Deviation Scoring, (c) Score Clustering, (d) Mask Approximation, (e) Color Segmentation, (f) Region Scoring.

Fig. 2: Illustration of the five stages of our algorithm. Fig. 2a: Input image with low DOF and relatively complex background regions. Fig. 2b: Identify sharp pixels by computing the difference between the edges of the original image and the edges of the blurred version of the image. Fig. 2c: Generate clusters from pixels with a high score by a density-based clustering algorithm (for a better visual representation, we colored each found cluster and surrounded it with its convex hull). Fig. 2d: Filling all convex hulls of all neighbors of all dense pixels. Fig. 2e: Group pixels into regions that contain similarly colored pixels in the original image (for a better visual representation, we colored each found region in a random color). Fig. 2f: Removing all color regions with low relevance.
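For comparison, the HOS map of [7] (Eqs. 2 and 3) can be sketched as below for a gray-scale image given as a list of rows. The 3×3 window $\eta(x,y)$ and the down-scaling factor of 100 come from the text; clipping the window at the image border is our assumption.

```python
def hos_map(gray, dbf=100.0):
    """HOS map from Eqs. 2-3: clipped fourth-order central moment in a 3x3 window."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # eta(x, y): the 3x3 neighborhood, clipped at the image border (assumption)
            window = [gray[t][s]
                      for t in range(max(0, y - 1), min(height, y + 2))
                      for s in range(max(0, x - 1), min(width, x + 2))]
            n = len(window)
            mean = sum(window) / n                          # Eq. 3: sample mean
            m4 = sum((v - mean) ** 4 for v in window) / n   # fourth-order moment
            out[y][x] = min(255.0, m4 / dbf)                # Eq. 2
    return out
```

Even a single sharp step edge saturates the map at 255, which illustrates why textured but slightly blurred backgrounds also produce many high HOS values.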

Thus, we propose the process of Deviation Scoring. Let $I$ be the set of pixels of the processed image. For each pixel $p(x,y) \in I$, the mean color of the pixel's $r$-neighborhood

$$\eta_r(x,y) = \{\, p(x',y') \mid |x'-x| \le r \,\wedge\, |y'-y| \le r \,\}$$

is calculated, with $r$ denoting the maximum coordinate-wise distance to the pixel $p(x,y)$, so that $\eta_r(x,y)$ is a square window of $(2r+1)\times(2r+1)$ pixels. The color value of $p(x,y)$ is represented in the L*a*b* color space and denoted by $(L^*, a^*, b^*)$. Thus, the mean neighborhood color of $p(x,y)$ in the $L^*$-band is determined by

$$\bar{L}_{\eta_r(x,y)} = \frac{1}{|\eta_r(x,y)|} \sum_{(s,t)\in\eta_r(x,y)} L^*(s,t).$$

The values for the $a^*$- and $b^*$-band are denoted by $\bar{a}_{\eta_r(x,y)}$ and $\bar{b}_{\eta_r(x,y)}$, respectively, so that the mean neighborhood color $Lab_{\eta_r(x,y)}$ of a pixel $p(x,y)$ is defined by $(\bar{L}_{\eta_r(x,y)}, \bar{a}_{\eta_r(x,y)}, \bar{b}_{\eta_r(x,y)})$. According to the International Commission on Illumination (CIE)¹, the color distance $\Delta E^*(u,v)$ between two color values $u$ and $v$ in the L*a*b* color space is calculated using the Euclidean distance:

$$\Delta E^*(u,v) = \sqrt{(L^*_u - L^*_v)^2 + (a^*_u - a^*_v)^2 + (b^*_u - b^*_v)^2}$$

¹CIE: Commission Internationale de l'Éclairage, http://www.cie.co.at
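The following sketch computes the mean neighborhood color $Lab_{\eta_r(x,y)}$ and the Euclidean (CIE76) distance $\Delta E^*$ described above. It assumes that the image has already been converted to L*a*b* and is stored as a list of rows of $(L^*, a^*, b^*)$ tuples; the RGB-to-L*a*b* conversion is omitted, and windows are clipped at the image border, which is our assumption.

```python
import math


def delta_e76(u, v):
    """CIE76 color difference: Euclidean distance of two (L*, a*, b*) triples."""
    return math.sqrt(sum((cu - cv) ** 2 for cu, cv in zip(u, v)))


def mean_neighborhood_color(lab, x, y, r):
    """Mean (L*, a*, b*) over eta_r(x, y) = {(x', y') : |x'-x| <= r and |y'-y| <= r}."""
    height, width = len(lab), len(lab[0])
    pixels = [lab[yy][xx]
              for yy in range(max(0, y - r), min(height, y + r + 1))
              for xx in range(max(0, x - r), min(width, x + r + 1))]
    n = len(pixels)
    # Average each band (L*, a*, b*) separately.
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))
```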






Fig. 3 panels: (a) Original image, (b) Canny Edge Detection, (c) Higher Order Statistics, (d) Deviation Scoring.

Fig. 3: Comparison of edge detection techniques.
Fig. 4 panels: (a) HSV (0, 1, 1) increased in component H. (b) HSV (0, 0, 1) increased in component S. (c) HSV (0, 0, 0) increased in component V.

Fig. 4: Visualization of the $\Delta E^*$ color distance within each component of the well-known HSV color space, with colors $c_0, \ldots, c_9$, where the color $c_i$ of the $i$-th square is increased in component H (Fig. 4a), S (Fig. 4b) and V (Fig. 4c), respectively, so that $\Delta E^* = 16$.



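For each $p(x,y)$, the neighbor difference $\Delta_{\eta_r(x,y)}$ is then defined by

$$\Delta_{\eta_r(x,y)} = \frac{\Delta E^*\big(p(x,y),\, Lab_{\eta_r(x,y)}\big)}{\Delta E^*_{max}} \cdot 255$$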

with $\Delta E^*_{max}$ being the maximum possible distance in the L*a*b* color space and $\Delta E^*(u,v)$ being the Euclidean distance of the color values $u$ and $v$ in the L*a*b* space. In Fig. 4, we illustrate a color distance of $\Delta E^* = 16$ by varying one of the three components of a base color defined in the HSV color space.

In the following, all pixels $p(x,y)$ with a neighbor difference greater than the threshold $\Theta_{score} \in [0, 255]$ are called edges or edge pixels, so that $\Delta_{\eta_r(x,y)} > \Theta_{score}$ holds for each edge pixel of the image. Even though the parameter $\Theta_{score}$ could be set freely, we recommend a value of 50 (cf. Tab. 1), as it showed the best results. Before the score values of the edge pixels are calculated, $I$ is convolved with a Gaussian kernel with standard deviation $\sigma = \Theta_\sigma$ to remove noise and generally soften the image. We recommend setting $\Theta_\sigma$ as listed in Tab. 1. The resulting image is then denoted by $I'$.

Afterwards, another image $I''$ is created by convolving $I'$ once again with the same Gaussian kernel. $I'$ and $I''$ are then used to compute the score values $\mu \in [0, 255]$ of the edge pixels. To this end, the score $\mu(x,y)$ of an edge pixel $p(x,y)$ is determined by the squared difference of its neighbor differences in the images $I'$ and $I''$:

$$\mu(x,y) = \min\left\{255,\ \left(\Delta^{I'}_{\eta_r(x,y)} - \Delta^{I''}_{\eta_r(x,y)}\right)^2\right\}$$



Due to the limitation $\mu(x,y) \le 255$, we treat all color changes between $I'(x,y)$ and $I''(x,y)$ with $\Delta E > 16$ equally, since $\sqrt{255} \approx 16$. This can be justified by human perception, which regards two colors $u$ and $v$ as rather dissimilar if $\Delta E^*(u,v) > 12$ [3]. Thus, $\Delta E^* > 16$ indicates a significant color change, which is also a strong indication of an edge.

Afterwards, all edge pixels with a score value of at least $\Theta_{score}$ are treated as candidates for the focused region of the image, while the score values of all pixels with a score value less than $\Theta_{score}$ are set to 0, so that these pixels are no candidates. The resulting candidate set $I_{score}(x,y)$ is defined by

$$I_{score}(x,y) = \begin{cases} 0 & \text{if } \mu(x,y) < \Theta_{score} \\ \mu(x,y) & \text{otherwise} \end{cases}$$

An illustration of the candidate set can be seen in Fig. 3d and Fig. 2b, where brighter pixels indicate a large score and black pixels indicate a score less than the threshold $\Theta_{score}$.
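Putting this subsection together, the sketch below computes the neighbor difference $\Delta$, the score $\mu$ from the two blurred images $I'$ and $I''$, and the candidate set $I_{score}$. Several details are assumptions rather than statements from the paper: $\Delta$ is taken as $\Delta E^*$ to the mean neighborhood color scaled by $255/\Delta E^*_{max}$, $\Delta E^*_{max}$ is taken as the distance between (0, -128, -128) and (100, 127, 127), the edge test uses the unblurred image $I$, the default radius r = 1 is hypothetical, and $I'$ and $I''$ are supplied by the caller (for example via a Gaussian blur with $\sigma = \Theta_\sigma$).

```python
import math

# Assumption: Delta E*_max is the CIE76 distance between (0, -128, -128) and (100, 127, 127).
DELTA_E_MAX = math.sqrt(100 ** 2 + 255 ** 2 + 255 ** 2)


def delta_e76(u, v):
    """CIE76 color difference of two (L*, a*, b*) triples."""
    return math.sqrt(sum((cu - cv) ** 2 for cu, cv in zip(u, v)))


def neighbor_difference(lab, x, y, r):
    """Distance of p(x, y) to the mean color of eta_r(x, y), scaled to [0, 255] (assumed form)."""
    height, width = len(lab), len(lab[0])
    window = [lab[yy][xx]
              for yy in range(max(0, y - r), min(height, y + r + 1))
              for xx in range(max(0, x - r), min(width, x + r + 1))]
    mean = tuple(sum(p[c] for p in window) / len(window) for c in range(3))
    return 255.0 * delta_e76(lab[y][x], mean) / DELTA_E_MAX


def candidate_scores(lab, lab_blur1, lab_blur2, r=1, theta_score=50.0):
    """I_score: the score mu(x, y) of edge pixels with mu >= theta_score, else 0."""
    height, width = len(lab), len(lab[0])
    scores = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Edge pixels are those whose neighbor difference in I exceeds theta_score.
            if neighbor_difference(lab, x, y, r) <= theta_score:
                continue
            diff = (neighbor_difference(lab_blur1, x, y, r)
                    - neighbor_difference(lab_blur2, x, y, r))
            mu = min(255.0, diff ** 2)          # squared difference, capped at 255
            scores[y][x] = mu if mu >= theta_score else 0.0
    return scores
```

Under this assumed scaling, a black-to-white step edge gives $\Delta \approx 23$ for r = 1, so $\Theta_{score}$ and r have to be chosen together when experimenting with the sketch.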

3.2 Score Clustering 

In this step, clusters are generated from all points in $I_{score}$ in order to find compound regions of focused areas. To this end, the density-based clustering algorithm DBSCAN [4] is used. In contrast to K-Means [5], which partitions the image into convex clusters, DBSCAN also supports concave structures, which is more desirable in this case. The following section gives a short outline of DBSCAN and then describes how the necessary parameters $\varepsilon$ and $minPts$ are determined automatically and how DBSCAN is used in our segmentation algorithm for further processing.
3.2.1 DBSCAN

In this stage, clusters are generated from all $p \in I_{score}$ by applying DBSCAN, which is based on the two parameters $\varepsilon$ and $minPts$. The main idea of this clustering algorithm is that each point in a cluster is located in a dense neighborhood of other pixels belonging to the same cluster. The area in which the neighbors must be located is called the $\varepsilon$-neighborhood of a point, denoted by $N_\varepsilon(p)$, which is defined as follows:

$$N_\varepsilon(p) = \{\, q \in D \mid dist(p,q) \le \varepsilon \,\} \qquad (4)$$

where $D$ is the database of points and $dist(p,q)$ describes the distance measure (e.g. the Euclidean distance) between two points $p, q \in D$.

For each point $p$ of a cluster $C$, there must exist a point $q \in C$ so that $p$ is within the $\varepsilon$-neighborhood of $q$ and $N_\varepsilon(q)$ includes at least $minPts$ points. Therefore, some definitions are introduced in the following. Considering $\varepsilon$ and $minPts$, a point $p$ is called directly density-reachable from another point $q$ if $p \in N_\varepsilon(q)$ and $q$ is a so-called core point. A point $p$ is defined as a core point if $|N_\varepsilon(p)| \ge minPts$ holds. If there exists a chain of $n$ points $p_1, \ldots, p_n$ such that $p_{i+1}$ is directly density-reachable from $p_i$, then $p_n$ is called density-reachable from $p_1$. Two points $p$ and $q$ are density-connected if there is a point $o$ from which both $p$ and $q$ are density-reachable, considering $\varepsilon$ and $minPts$.

Now, a cluster $C$ can be defined as a non-empty subset of the database $D$ so that the following two conditions hold:

• $\forall p, q$: if $p \in C$ and $q$ is density-reachable from $p$, then $q \in C$

• $\forall p, q \in C$: $p$ is density-connected to $q$

Points that do not belong to any cluster are treated as noise, $noise = \{\, p \in D \mid \forall i: p \notin C_i \,\}$, where $i = 1, \ldots, k$ and $C_1, \ldots, C_k$ are the clusters found in $D$.
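A compact DBSCAN sketch following the definitions above: Eq. 4 with the Euclidean distance, core points with $|N_\varepsilon(p)| \ge minPts$ (the point itself counts as a member of its own neighborhood), clusters grown from core points, and points that are never reached labeled as noise. Points are (x, y) tuples, e.g. the pixels with $I_{score}(x,y) > 0$. The brute-force neighborhood query costs O(n²) and is only meant for illustration; a spatial index would be used on real images.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points in the eps-neighborhood N_eps(points[i]) (Eq. 4)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index >= 0, or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point of a cluster
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: density-reachable, but not a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels
```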

3.2.2 Determination of Parameters 

To provide the highest flexibility with respect to the different occurrences of the focused area, we do not apply absolute values for $\varepsilon$ and $minPts$, but compute them relative to the size of the image and its score distribution. Thus, $\varepsilon$ is calculated by $\varepsilon = \sqrt{|I|} \cdot \Theta_\varepsilon$, with $|I|$ denoting the total number of pixels of the image $I$ and $\Theta_\varepsilon \in [0, 1]$. The second parameter $minPts$ is determined by

$$minPts = \min\{\ldots\}$$

with the threshold $\Theta_{dbscan}$ set to 255.

The result of the DBSCAN clustering is a cluster set $C = \{c_1, \ldots, c_n\}$, with each $c_i \in C$ representing a subset of pixels $p(x,y) \in I$. Due to our assumption that small isolated sharp areas are treated as noise, we define the relevant score cluster set

$$C' = \{\, c \in C \mid |c| > \ldots \,\}$$

where the size bound depends on $max_C = \max\{|c_1|, \ldots, |c_n|\}$, the number of pixels of the largest cluster. An illustration of this step can be seen in Fig. 2c on page 5, where different clusters are painted in different colors.

3.3 Mask Approximation 

The relevant score cluster set $C'$, as defined in Sec. 3.2, is already a good reference for the OOI's location and distribution. In general, however, there is no single contiguous area but several individual regions of interest representing the focused objects. This stage of the algorithm connects all clusters $c \in C'$ to a contiguous area, which represents an approximate binary mask of the OOI. This is achieved through the two steps Convex Hull Linking and Morphological Filtering, which are described in more detail below.

3.3.1 Convex Hull Linking 

In the convex hull linking step, we first generate the convex hull of all points in the $\varepsilon$-neighborhood $N_\varepsilon(p)$ of each core point $p$ of the cluster set. Let $K = \{k_1, \ldots, k_j\}$ be the set of all core points of the score clusters in $C'$, and let $convex(P)$ be the convex hull of a point set $P$. Then we can define the set of convex hull polygons $\mathcal{H} = \{convex(N_\varepsilon(k_1)), \ldots, convex(N_\varepsilon(k_j))\}$, which is used to generate a contiguous area. To this end, each $p(x,y) \in I$ is checked for whether it is located within one of the convex hull polygons of $\mathcal{H}$. If so, we mark this pixel with 1, otherwise with 0. The binary approximation mask $I_{app}$ is then given by

$$I_{app}(x,y) = \begin{cases} 1 & \text{if } \exists H_i \in \mathcal{H} : p(x,y) \in H_i \\ 0 & \text{otherwise} \end{cases}$$

Afterwards, we apply the morphological filter operations closing and dilation by reconstruction to $I_{app}$ for smoothing and closing small holes.

3.3.2 Morphological Filtering

Morphological filters are based on the two primary operations dilation $\delta_H(I)$ and erosion $\varepsilon_H(I)$, where $H(i,j) \in \{0,1\}$ denotes the structuring element. For a binary image $I$, $\delta_H(I)$ and $\varepsilon_H(I)$ are defined as in the following equations:

$$\delta_H(I) = \{\, (s,t) = (u+i,\ v+j) \mid (u,v) \in I,\ (i,j) \in H \,\}$$

$$\varepsilon_H(I) = \{\, (s,t) \mid (s+i,\ t+j) \in I\ \ \forall (i,j) \in H \,\}$$

The morphological closing $\varphi_H$ is a composition of the two primary operations, so that $\varphi_H(I) = \varepsilon_H(\delta_H(I))$. Thus, the input image $I$ is first dilated and subsequently eroded, both times with the same structuring element $H$. In order to define the operation dilation by reconstruction, some more definitions are required. First, the primary operations $\delta_H(I)$ and $\varepsilon_H(I)$ are extended to the basic geodesic dilation $\delta^{(1)}(I, I')$ and the basic geodesic erosion $\varepsilon^{(1)}(I, I')$ of size one, as in the following equations:

$$\delta^{(1)}(I, I')(u,v) = \min\{\delta_H(I)(u,v),\ I'(u,v)\}$$

$$\varepsilon^{(1)}(I, I')(u,v) = \max\{\varepsilon_H(I)(u,v),\ I'(u,v)\}$$

Note that these basic geodesic operations need an additional image $I'$, which is called marker, whereas the input image $I$ is called mask. Thus, the result of a geodesic erosion at position $(u,v)$ is the maximum of the erosion $\varepsilon_H$ of the mask $I$ and the value of the marker image $I'(u,v)$, and vice versa for the geodesic dilation. The geodesic erosion $\varepsilon^{(\infty)}$ and dilation $\delta^{(\infty)}$ of infinite size, called reconstruction by erosion $\varphi^{(rec)}$ and reconstruction by dilation $\gamma^{(rec)}$, are then defined as follows:

$$\varphi^{(rec)}(I, I') = \varepsilon^{(\infty)}(I, I') = \varepsilon^{(1)} \circ \varepsilon^{(1)} \circ \cdots \circ \varepsilon^{(1)}(I, I')$$

$$\gamma^{(rec)}(I, I') = \delta^{(\infty)}(I, I') = \delta^{(1)} \circ \delta^{(1)} \circ \cdots \circ \delta^{(1)}(I, I')$$

Note that $\varphi^{(rec)}(\cdot,\cdot)$ and $\gamma^{(rec)}(\cdot,\cdot)$ converge and achieve stability after a finite number of iterations. Thus, these functions do not need to be executed indefinitely, and the application is guaranteed to terminate.

3.3.3 Application

In our approach, we first apply a morphological closing operation $\varphi_H(I_{app}) = \varepsilon_H(\delta_H(I_{app}))$ to the approximate mask. Afterwards, we use $\gamma^{(rec)}(I_{app}, \delta_{H'}(I_{app}))$ to close holes in the approximate mask $I_{app}$. The dimension of the structuring element $H$ is $h \times h$, where $h$ is calculated relative to the total pixel count $|I_{app}|$ of the image $I_{app}$, so that $h = \sqrt{|I_{app}|} \cdot \Theta_{rec}$, with $\Theta_{rec} \in [0, 1]$. After this morphological processing, the approximate mask $I_{app}$ covers the OOI quite well (cf. Fig. 2d on page 5).



In general, however, it includes boundary regions that exceed the borders of the OOI and tend to surround it with a thick border. The following two stages of our algorithm refine the mask by erasing these surrounding border regions.
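The mask approximation can be sketched as follows on binary masks stored as lists of rows of 0/1 values. The $\varepsilon$-neighborhoods of the core points are assumed to come from the clustering step; the convex hulls use Andrew's monotone chain (our choice of hull algorithm), the structuring element is a square of side 2k+1, and pixels outside the image are ignored by dilation and erosion. `reconstruct_by_dilation` iterates the basic geodesic dilation $\min\{\delta_H(I), I'\}$ with a 3×3 element until the result no longer changes, which is our reading of $\gamma^{(rec)}$; all function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on a hull returned by convex_hull."""
    if not hull:
        return False
    if len(hull) == 1:
        return p == hull[0]
    if len(hull) == 2:  # degenerate hull: a line segment
        (ax, ay), (bx, by) = hull
        return (cross(hull[0], hull[1], p) == 0
                and min(ax, bx) <= p[0] <= max(ax, bx)
                and min(ay, by) <= p[1] <= max(ay, by))
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0 for i in range(len(hull)))


def approximate_mask(neighborhoods, height, width):
    """I_app: 1 where a pixel lies in the convex hull of some N_eps(k_i), else 0."""
    hulls = [convex_hull(n) for n in neighborhoods]
    return [[1 if any(inside_hull(h, (x, y)) for h in hulls) else 0
             for x in range(width)] for y in range(height)]


def _window(mask, x, y, k):
    """Values of the (2k+1)x(2k+1) window around (x, y); outside pixels are ignored."""
    h, w = len(mask), len(mask[0])
    return [mask[yy][xx]
            for yy in range(max(0, y - k), min(h, y + k + 1))
            for xx in range(max(0, x - k), min(w, x + k + 1))]


def dilate(mask, k):
    return [[max(_window(mask, x, y, k)) for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask, k):
    return [[min(_window(mask, x, y, k)) for x in range(len(mask[0]))] for y in range(len(mask))]


def closing(mask, k):
    """phi_H(I) = eps_H(delta_H(I)): dilation followed by erosion with the same element."""
    return erode(dilate(mask, k), k)


def reconstruct_by_dilation(image, limit):
    """Repeat min(delta_H(current), limit) with a 3x3 element until nothing changes."""
    current = image
    while True:
        grown = dilate(current, 1)
        nxt = [[min(a, b) for a, b in zip(row_g, row_l)] for row_g, row_l in zip(grown, limit)]
        if nxt == current:
            return nxt
        current = nxt
```

Since each geodesic step after the first can only add pixels and is bounded by the limit image, the loop stops after finitely many passes, matching the termination argument above.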

3.4 Color Segmentation 

In this stage, the pixels of the approximate mask $I_{app}$ are divided into groups, so that each group contains pixels that correspond to similar colors in $I$. To this end, we process each $p(u,v) \in I_{app}$ and iteratively include all its neighbors $n$ for which the following conditions hold:

$$n \in \{\, (s,t) \in \eta_1(u,v) \mid I_{app}(s,t) = I_{app}(u,v) = 1 \,\} \;\wedge\; \Delta E^*(p,n) < \Theta_{dist} \qquad (5)$$
The threshold $\Theta_{dist} \in [0, 100]$ is an internal parameter, which specifies the maximum distance between two color values $u, v$ in the L*a*b* color space.

A method $expand(x, y, R)$ is called for each $p_1(x,y) \in \{\, (x,y) \in I_{app} \mid I_{app}(x,y) = 1 \,\}$ that is not yet marked as visited. Here, $R = \{(x,y)\}$ defines a new color region formed by the point $p_1(x,y)$. The method $expand(x, y, R)$ then proceeds as follows: for all neighbors $p_2(u,v)$ of $p(x,y)$ fulfilling Eq. 5, we add $p_2$ to $R$ and mark $p_2(u,v)$ as visited. Then $expand(u, v, R)$ is called recursively. The resulting set of regions is called $R_{color}$.
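The region growing of Eq. 5 can be sketched as follows, assuming a list-of-rows L*a*b* image, a 0/1 mask $I_{app}$, the 8-connected 3×3 neighborhood, and the CIE76 distance between a pixel and the neighbor it reaches. An explicit stack replaces the recursive $expand(x, y, R)$ call, which avoids Python's recursion limit on large regions; the value of $\Theta_{dist}$ is left to the caller.

```python
import math


def delta_e76(u, v):
    """CIE76 color difference of two (L*, a*, b*) triples."""
    return math.sqrt(sum((cu - cv) ** 2 for cu, cv in zip(u, v)))


def color_regions(lab, mask, theta_dist):
    """Split the pixels with mask == 1 into regions of similar color (R_color)."""
    height, width = len(mask), len(mask[0])
    visited = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 1 or visited[y0][x0]:
                continue
            visited[y0][x0] = True
            region, stack = [(x0, y0)], [(x0, y0)]   # new region R = {(x0, y0)}
            while stack:                           # iterative version of expand(x, y, R)
                x, y = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        u, v = x + dx, y + dy
                        if (0 <= u < width and 0 <= v < height and not visited[v][u]
                                and mask[v][u] == 1
                                and delta_e76(lab[y][x], lab[v][u]) < theta_dist):
                            visited[v][u] = True
                            region.append((u, v))
                            stack.append((u, v))
            regions.append(region)
    return regions
```

Because the distance is measured between neighboring pixels rather than against the seed color, a slow color gradient can still end up in a single region.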

3.5 Region Scoring 

In this step, a relevance value is calculated for each region $r \in R_{color}$. The more a region is surrounded by areas with a large score $\mu$, the larger its relevance value. Low-relevance regions are removed afterwards, which causes an update of the relevance values of the neighboring regions and thus possibly triggers another deletion if the relevance of an updated region is no longer high enough.

3.5.1 Boundary Overlap 

The boundary overlap $BO_r$ of a region $r$ is a measure for the adjacency of $r$ to the approximate mask $I_{app}$ and is defined as

$$BO_r = \big|\{\, (u,v) \in B_r \mid \exists r' \in R : (u,v) \in r' \,\}\big|,$$

where $B_r$ is the set difference between the dilation of $r$ and $r$ itself. The mask boundary overlap $MBO_r$ of $r$ is then defined as $MBO_r = \frac{BO_r}{|B_r|}$. $MBO_r$ specifies the ratio of the number of outline points located in other regions to the number of all outline points of $r$.

The score boundary overlap $SBO_r = \frac{BO^{\mu}_r}{|B_r|}$ of $r$ is a measure for the adjacency of $r$ to the corresponding score values $\mu$. A large $SBO_r$ indicates that $r$ has a neighborhood with large corresponding score values $\mu$.

3.5.2 Mask Relevance 

The mask relevance of a given region $r$ can then be defined as $MR_r = SBO_r \cdot MBO_r$. Afterwards, we eliminate all regions $r$ whose mask relevance is too low. The calculation of $MR_r$ is executed iteratively: let $MR^i_r$ denote the value of $MR_r$ of a region $r$ during the $i$-th iteration. One iteration cycle computes $MR^i_r$ for each region $r$ and deletes $r$ from the approximate mask $I_{app}$ if $MR^i_r < \theta_{rel}$. The precise assignment of $\theta_{rel}$ and its impact on segmentation quality are discussed in Sec. 5.6. Once a region $r$ satisfies $MR^i_r < \theta_{rel}$ at iteration $i$, it is erased from $I_{app}$, so that $\forall (x,y) \in r : I_{app}(x,y) = 0$. The calculation of $MR^i_r$ continues for $i = 1, \ldots, m$ iterations and terminates as soon as there are no more regions to delete, i.e., as soon as

$$\exists\, m > 1 \mid \forall r \in R_{color} : MR^m_r = MR^{m-1}_r .$$
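A minimal sketch of this iteration follows. It assumes that all $MR^i_r$ of one cycle are evaluated before any region of that cycle is deleted (the text does not fix the update order), and it takes the relevance computation as a callable. The `neighbor_fraction` relevance in the usage example is a hypothetical toy stand-in for $SBO_r \cdot MBO_r$, used only to show how one deletion can cascade to neighboring regions.

```python
from typing import Callable, Dict, Set

# relevance(r, active) returns MR_r for region r, given the set of regions that
# are still part of the approximate mask I_app in the current iteration.
Relevance = Callable[[int, Set[int]], float]


def prune_regions(regions: Set[int], relevance: Relevance, theta_rel: float) -> Set[int]:
    """Iteratively delete regions with MR_r^i < theta_rel until nothing changes."""
    active = set(regions)
    while True:
        # Evaluate MR_r^i for every remaining region against the same mask state.
        scores: Dict[int, float] = {r: relevance(r, active) for r in active}
        to_delete = {r for r, mr in scores.items() if mr < theta_rel}
        if not to_delete:
            # No deletion means the mask, and hence every MR_r, is stable.
            return active
        active -= to_delete


def neighbor_fraction(adjacency: Dict[int, Set[int]]) -> Relevance:
    """Hypothetical stand-in for SBO_r * MBO_r: share of neighbors still active."""
    def relevance(r: int, active: Set[int]) -> float:
        neighbors = adjacency[r]
        return len(neighbors & active) / (len(neighbors) or 1)
    return relevance


adjacency = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: set()}
kept = prune_regions(set(adjacency), neighbor_fraction(adjacency), theta_rel=0.6)
print(kept)  # {1, 2, 3}
```

Region 5 is removed in the first cycle, which lowers the relevance of region 4 so that it is removed in the second cycle; region 3 keeps two of its three neighbors and survives.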



4 Experimental Results 

The proposed algorithm is designed to be parameterless and thus applicable to different types of images without having to adjust parameter values by hand. The quality of the resulting segmentation then depends only on the size and resolution of the input image. In this section we discuss the quality measure used to compare different segmentation algorithms, and we show key features, such as the amount of depth of field, that affect the difficulty of the segmentation. Furthermore, we demonstrate that images with a higher resolution generally lead to better segmentation results, in contrast to the reference algorithms, which lose accuracy with growing size of the processed image.

In Sec. 4.3 we list the internal parameters and threshold values that we determined during our development and testing phases, and in Sec. 5 we show their impact on the quality of the segmentation result and on the runtime.



4.1 Quality measure 

Table 1: Parameters used in the algorithm.

To determine the quality of a segmentation mask $I$, we use the spatial distortion $d'(I, I_r)$ as proposed in previous work:

$$d'(I, I_r) = \frac{\sum_{(x,y)} I(x,y) \otimes I_r(x,y)}{\sum_{(x,y)} I_r(x,y)} ,$$

where $\otimes$ is the binary XOR operation and $I_r$ is the manually generated reference mask that represents the ground truth. The spatial distortion relates the errors that occurred, false negatives and false positives, to the size of the reference mask, which equals the sum of all true positives and false negatives. Notice that $d'$ can grow larger than 1 if more pixels are misclassified than there are foreground pixels in total. Let $I_{none}$ be a blank mask, so that $I_{none}(x,y) = 0$ for each pixel $p(x,y)$. The number of false negatives then equals the number of foreground pixels in $I_r$, because no pixel in $I_{none}$ is labeled as foreground. Hence, for each image, $d'(I_{none}, I_r) = 1$. We therefore limit the distortion to $d \in [0, 1]$, so that $d(I, I_r) = \min\{1, d'(I, I_r)\}$.
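The clipped distortion $d(I, I_r)$ is straightforward to compute for binary masks. The following sketch assumes masks are given as nested lists of 0/1 values of equal shape; the function name is hypothetical.

```python
from typing import Sequence


def spatial_distortion(mask: Sequence[Sequence[int]],
                       reference: Sequence[Sequence[int]]) -> float:
    """d(I, I_r) = min(1, sum(I xor I_r) / sum(I_r)) for binary 0/1 masks."""
    errors = 0      # false positives + false negatives
    foreground = 0  # |I_r| = true positives + false negatives
    for row_m, row_r in zip(mask, reference):
        for m, r in zip(row_m, row_r):
            errors += m ^ r
            foreground += r
    if foreground == 0:
        raise ValueError("reference mask contains no foreground pixels")
    return min(1.0, errors / foreground)


reference = [[0, 1, 1], [0, 1, 1]]
mask = [[0, 1, 1], [1, 1, 0]]   # one false positive, one false negative
blank = [[0, 0, 0], [0, 0, 0]]  # the blank mask I_none
print(spatial_distortion(mask, reference))   # 0.5
print(spatial_distortion(blank, reference))  # 1.0
```

The blank mask reaches exactly $d = 1$, and any worse mask (more errors than foreground pixels) is clipped to the same value.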

4.2 Dataset 

All experiments were conducted on a diverse dataset of 65 images, downloaded from Flickr or taken by ourselves. The images come from different categories with strong variations in the amount of depth of field as well as in the fuzziness of the background. The selection of the images also does not focus on certain sceneries, topics or coloring schemes, in order to avoid overfitting to certain types of images. In our experiments, we compare the spatial distortion of the proposed algorithm with re-implementations of the works presented in [7], [11] and [18]. The parameters of all algorithms were optimized to achieve the best average spatial distortion over the complete test set.

4.3 Comparison 

A major contribution of this algorithm is that none of the parameters introduced in the previous section needs to be hand-tuned for an image, as all parameters are either independent of the image or determined fully automatically. An overview of the implicit parameters and their default values is given in Table 1.



Parameter               Value    Description
$\theta_{score}$        50       Score $\mu$ threshold of $I_{score}$
$\theta_\varepsilon$    0.025    Spatial radius of DBSCAN
$\theta_\sigma$         0.9      Gaussian blur radius
$\theta_{dist}$         25       Color similarity distance
$\theta_{rec}$                   Relative size of reconstruction
$\theta_{rel}$                   Minimum value of relevant regions



To calculate the score image $I_{score}$, we use a Gaussian blur with standard deviation $\theta_\sigma = 0.9$. $I_{score}$ is then scaled to fit in 400 × 400 pixels to improve the processing speed of the subsequent steps without a major impact on accuracy. We set $\theta_{score} = 50$, so that a score $\mu$ must exceed 50 to be processed by the density-based clustering algorithm DBSCAN. DBSCAN uses a neighborhood distance $\varepsilon = \theta_\varepsilon \sqrt{|I|}$, where $|I|$ is the total pixel count of image $I$, and $minPts$ is calculated depending on $\varepsilon$, as described in Sec. 3.2.2.
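The clustering step can be sketched as follows. `candidate_points` applies the $\theta_{score}$ threshold, `dbscan_eps` evaluates $\varepsilon = \theta_\varepsilon \sqrt{|I|}$, and `dbscan` is a plain textbook DBSCAN with a brute-force neighbor search. The rule that derives $minPts$ from $\varepsilon$ (Sec. 3.2.2) is not reproduced here, so $minPts$ is passed in explicitly; all function names and the toy values in the example ($\varepsilon = 2$, $minPts = 4$) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (x, y)


def dbscan_eps(width: int, height: int, theta_eps: float = 0.025) -> float:
    """Neighborhood distance eps = theta_eps * sqrt(|I|), |I| = total pixel count."""
    return theta_eps * math.sqrt(width * height)


def candidate_points(score: Sequence[Sequence[float]], theta_score: float = 50) -> List[Point]:
    """Positions (x, y) whose score mu exceeds theta_score."""
    return [(x, y) for y, row in enumerate(score) for x, mu in enumerate(row) if mu > theta_score]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN (brute-force neighbor search); label -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)

    def region_query(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep growing
    return [label if label is not None else -1 for label in labels]


score = [[0] * 20 for _ in range(20)]
for y in range(2, 5):
    for x in range(2, 5):
        score[y][x] = 200            # first dense focus area
        score[y + 12][x + 12] = 200  # second dense focus area
score[2][10] = 100                   # isolated pixel above the threshold

points = candidate_points(score)     # theta_score = 50
labels = dbscan(points, eps=2.0, min_pts=4)
print(labels.count(0), labels.count(1), labels.count(-1))  # 9 9 1
```

On a 400 × 400 score image the default $\theta_\varepsilon = 0.025$ gives $\varepsilon = 10$ pixels. The brute-force neighbor search is quadratic in the number of candidate points, which is one reason why thresholding with $\theta_{score}$ and downscaling $I_{score}$ matter for runtime.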



To smooth the approximate mask, we use the morphological operation reconstruction by dilation with a structuring element $H$ of size $\sqrt{|I_{approx}|} \cdot \theta_{rec}$. Further, we set the maximum distance threshold of similar color values in the L*a*b* color space to $\theta_{dist} = 25$. The refinement of the approximate mask removes all regions with a mask relevance value less than $\theta_{rel}$.

Fig. 6 compares the performance of the reference algorithms with our proposed method. Although the computation time of the proposed algorithm is greater than that of two of the three reference algorithms, it outperforms the reference algorithms in terms of spatial distortion in all cases. Our algorithm has an average spatial distortion error of 0.21 over the complete test set, which is less than half of that of the best competing algorithm, the morphological segmentation with region merging [7], with an average of 0.51. Our algorithm also achieves the lowest minimum error of 0.01, in contrast to 0.02 for the fuzzy segmentation algorithm [18].

It should also be noted that, in contrast to the reference algorithms, the proposed algorithm shows improved accuracy with larger images, whereas the competitors lose accuracy with growing image size. The minimum, median, average and standard deviation of the spatial distortion error are listed in Table 2.

Table 2: Spatial distortion and run time of the proposed algorithm compared to the reference algorithms.

              Proposed   [7]     [18]     [11]
Minimum       0.01       0.05    0.02     0.07
Median        0.10       0.48    0.93     1.00
Average       0.21       0.51    0.84     0.81
Std. Dev.     0.21       0.26    0.25     0.31
Time          28 s       9 s     54.2 s   2.7 s

Fig. 5: Spatial distortion error values of the segmentation with our proposed algorithm for each of the 65 dataset images.

Fig. 5 shows the spatial distortion error of each segmented image in the dataset.



4.4 Image Types 

For each image we can define key features as in Table 3. All these features affect the difficulty of an accurate segmentation.

Table 3: Key features of an image.

Feature              Values
amount of DOF        small           high
defocused regions    homogeneous     complex
color                plain           variant

Images with a smaller DOF, homogeneous defocused regions and variant colors tend to produce much better segmentation results than images with a high DOF, complex defocused regions and plain colors. The first type of image is commonly used in most of the competitors' publications to ensure a high segmentation quality of the presented algorithm. Figure 7 shows the minor segmentation differences for a small-DOF image. A much more challenging task is the segmentation of images with a complex background, as shown in Fig. 8. Here, our proposed algorithm achieves a good segmentation result with a spatial distortion of 0.21 (cf. Fig. 8b), whereas the other segmentation algorithms [18], [7] and [11] fail with spatial distortion values of 0.69 or more (cf. Figs. 8c, 8d and 8e).



4.5 Size of input image 

One of the most influential variables for segmentation quality is the resolution of the input image. A comparatively high resolution is needed for a proper segmentation if, for example, an image has only a slightly defocused background and thus shows significant texture. We therefore designed the algorithm to handle a large range of resolutions properly without loss of quality. Using the image of Fig. 8a as input, a spatial distortion value of 0.65 can already be achieved at the relatively small resolution of 200 × 300 pixels. For this particular image, a resolution of 240 × 350 pixels is needed to lower the spatial distortion to 0.42. Fig. 9 shows how the average and standard deviation of the spatial distortion develop with increasing image size. Because the orientation (portrait or landscape) and aspect ratio (2:3, 4:3, etc.) of the images in our test set vary, we rescale each image so that its longest side equals the target size. In the following, the term image size denotes the length of the longer side of the image.
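Because image size here means the longer side, the resizing step reduces to one scale factor per image. The helper below is a minimal sketch; its name and the rounding to whole pixels are assumptions. The same computation yields the dimensions when $I_{score}$ is fitted into 400 × 400 pixels, assuming the fit is determined by the longer side; if only downscaling is intended there, the scale factor would additionally be capped at 1.

```python
from typing import Tuple


def rescaled_size(width: int, height: int, target: int) -> Tuple[int, int]:
    """Scale so that the longer side equals `target`, preserving the aspect ratio."""
    scale = target / max(width, height)
    # Rounding to whole pixels (and keeping at least one pixel) is an assumption.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(rescaled_size(3000, 2000, 600))  # (600, 400), landscape 3:2
print(rescaled_size(2000, 3000, 600))  # (400, 600), portrait 2:3
print(rescaled_size(1024, 768, 400))   # (400, 300), fitted into 400 x 400
```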

As we increase the input image size from 100 pixels to 200 pixels, the average spatial distortion drops from 0.74 to 0.5 and its standard deviation from 0.84 to 0.43. When the image size reaches 600 pixels, the average and standard deviation of the spatial distortion are 0.36 and 0.25, respectively. At the same time, the runtime increases from an average of 686 milliseconds per image at 100 pixels to 4155 milliseconds at 300 pixels. In our experiments, the average spatial distortion and the corresponding standard deviation decrease significantly up to an image size of about 600 pixels. For images larger than 800 pixels, the average spatial distortion and standard deviation still improve, yet at a significantly slower rate than for smaller images. Fig. 9 summarizes the results of this experiment. Because the score image $I_{score}$ is scaled to fit in 400 × 400 pixels in the score clustering stage, the steep growth of the runtime levels off once the longest side of the image exceeds 400 pixels.

Fig. 6: Comparing the results of the different algorithms, where the input image has complex defocused regions and small DOF. (a) Original image. (b) Proposed algorithm. (c) Result of fuzzy segmentation [18]. (d) Result of morphological segmentation [7]. (e) Result of video segmentation [11].

Fig. 7: Comparing the results of the different algorithms, where the input image has homogeneous defocused regions, small DOF and variant colors, and thus represents an easy task for all algorithms. (a) Original with homogeneous defocused regions. (b) Proposed algorithm. (c) Result of fuzzy segmentation [18]. (d) Result of morphological segmentation [7]. (e) Result of video segmentation [11].

Fig. 8: Results of different segmentation algorithms applied to an image with complex background (Fig. 8a). The spatial distortions of the applied algorithms are 0.21 for our proposed algorithm (Fig. 8b), 1.0 for [18] (Fig. 8c), 0.69 for [7] (Fig. 8d) and 1.0 for [11] (Fig. 8e).

Fig. 9: Impact of the size of the input image on the error rate of the segmentation, measured by the spatial distortion, and on the average runtime per image.

4.6 Sample Segmentations

Fig. 10 presents some segmentation results for different color images. The input image is shown in the first column and our segmentation result in the second column. The morphological segmentation [7] is placed in the third column, the fuzzy segmentation [18] in the fourth column and the video segmentation [11] in the last column. Notice that in one case the resulting mask covers the entire image (second row of Fig. 10, third column).

5 Parameter Settings

In this section, we describe the internal parameter settings and threshold values. The parameter values are the result of an extensive evaluation and represent a recommended set of values that produced the best and most stable results. Thus, none of these values has to be set by the user. For each parameter, we discuss its impact on the average and standard deviation of the spatial distortion error and on the runtime, evaluated by segmenting our data set.



5.1 Blur Radius 

In the first stage, called deviation scoring, the parameter $\theta_\sigma$ determines the radius of the Gaussian convolution kernel that is used to remove noise from the input image. The same kernel is used to re-blur the de-noised image in order to compute the deviation of the mean neighbor difference of each pixel in the de-noised and the re-blurred image. Fig. 11 shows the distribution of the spatial distortion on the primary y-axis and the mean execution time of the algorithm in milliseconds on the secondary y-axis. If $\theta_\sigma$ is set too low, too little noise is removed from the input image. In addition, the edges of the OOI are less pronounced in the re-blurred image and can thus be distinguished less effectively from the edges caused by background noise. As illustrated in Fig. 11, a small blur radius of $\theta_\sigma = 0.775$ results in a high average spatial distortion $d > 0.8$ compared to a segmentation with a more suitable choice of $\theta_\sigma$. We assign $\theta_\sigma = 0.9$ as the best trade-off between runtime (approximately 8 s per image) and an average spatial distortion of 0.26. Larger values of $\theta_\sigma$ cause a very intense smoothing operation, so that the edges of the OOI lose their significance.

Fig. 11: Impact of the blur radius $\theta_\sigma$ on the spatial distortion error and the average runtime per image.

Fig. 12: Impact of the score clustering threshold $\theta_{score}$ on the spatial distortion error and the average runtime per image.

5.2 Score Clustering Threshold

The score clustering threshold $\theta_{score}$ describes the minimum score value $\mu$ that each pixel has to exceed in order to be processed by the DBSCAN clustering. This parameter ensures that the clustering algorithm does not have to handle too many pixels whose score values are not significant. If $\theta_{score}$ is set too low, DBSCAN consumes more time without significantly improving the segmentation quality. For $\theta_{score} < 25$, the spatial distortion even grows, because too many pixels with insignificant score values are considered.

If, on the other hand, $\theta_{score}$ is set too high, too many pixels with potentially essential score values are not processed, and the remaining clusters contain too few points. This results in a smaller mask that does not cover the whole focus area. As can be seen in Fig. 12, $\theta_{score} = 50$ is an optimal trade-off between spatial distortion and runtime.



5.3 Neighborhood distance

The neighborhood distance $\theta_\varepsilon$ directly influences the $\varepsilon$ parameter of DBSCAN, as $\varepsilon = \theta_\varepsilon \sqrt{|I|}$. A higher value of $\theta_\varepsilon$ increases the spatial radius $\varepsilon$, so that a core point has a larger reachability distance. If $minPts$, DBSCAN's second parameter, remained unchanged, an increase in $\theta_\varepsilon$ would result in fewer but larger clusters. As $minPts$ is defined relative to $\varepsilon$ (see Sec. 3.2.2), an increase of $\theta_\varepsilon$ also enlarges $minPts$ and vice versa. Fig. 13 illustrates the different clustering results when changing this parameter. If $\theta_\varepsilon$ is too low, the main focus area is split into many different, mostly small clusters. Thus, important information in $I_{score}$ would be interpreted as noise, as shown in Fig. 13b. As can be seen in Fig. 13c, $\theta_\varepsilon = 0.025$ produces a much more reliable clustering result that covers the main focus region without including the surrounding noise. If $\theta_\varepsilon$ is set too high, too much noise surrounding the OOI is merged with the main cluster (see Fig. 13d). The influence of $\theta_\varepsilon$ on the segmentation result is shown in Fig. 14. As the optimal distance we set $\theta_\varepsilon = 0.025$, where the average and standard deviation of the spatial distortion are 0.26 and 0.18, and the average runtime of less than 9000 milliseconds per image is still acceptable.

Fig. 13: Impact of the $\theta_\varepsilon$ parameter on score clustering. (a) $I_{score}$ of the Lena image. (b) Clustering result with $\theta_\varepsilon = 0.005$. (c) Clustering result with $\theta_\varepsilon = 0.025$. (d) Clustering result with $\theta_\varepsilon = 0.045$. For better visual distinction, each cluster is represented in a random color and surrounded by its convex hull.

Fig. 14: Impact of the neighborhood distance $\theta_\varepsilon$ on the spatial distortion error and the average runtime per image.

Fig. 15: Impact of the parameter $\theta_{rec}$ on smoothing and filling the approximate mask of the input image (Fig. 15a). Figs. 15b to 15d show the approximate masks after reconstruction by dilation with small ($\theta_{rec} = 0.1$), medium ($\theta_{rec} = 0.25$) and large ($\theta_{rec} = 0.6$) structuring elements.

Fig. 16: Impact of the parameter $\theta_{rec}$ on the spatial distortion error and the average runtime per image.

5.4 Size of the structuring element

The structuring element $H$ is used in our mask approximation stage after the convex hull linking step, where morphological filter operations smooth the current mask. If $H$ is too small, little gaps remain and prevent the subsequent reconstruction by dilation from filling holes in the approximate mask, as only dark regions that are completely surrounded by the white mask are treated as holes. As the size of $H$ increases, more gaps are closed and the contour of the OOI's approximate mask becomes more fuzzy. The reconstruction by dilation uses a structuring element $H'$, whose size is calculated as $\sqrt{|I_{approx}|} \cdot \theta_{rec}$. In summary, smaller sizes of $H'$ only fill small holes in the mask, while larger sizes of $H'$ also fill larger holes. Fig. 15 shows the reconstruction by dilation with small, medium and large structuring elements. For $\theta_{rec}$ within a certain range, the overall segmentation result on the complete data set is not strongly influenced by $\theta_{rec}$, as shown in Fig. 16; values outside of that range produce a higher error rate. As $\theta_{rec}$ approaches 1, the average runtime also increases rapidly, from around 10 seconds to more than 40 seconds.
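The hole criterion described above (only background regions completely enclosed by the mask count as holes) can be sketched as a reconstruction by dilation of the mask complement from the image border, computed here as an equivalent 4-connected flood fill. This is a simplified illustration under that assumption: it uses an elementary structuring element instead of the size-dependent $H'$, and `fill_holes` is a hypothetical helper name. The usage lines show why a one-pixel gap in the contour keeps a hole from being filled.

```python
from collections import deque
from typing import List

Mask = List[List[int]]  # binary mask, 1 = object, 0 = background


def fill_holes(mask: Mask) -> Mask:
    """Set every background pixel that cannot reach the image border to 1.

    Reconstruction by dilation of the background from a border marker, using
    an elementary 4-connected structuring element iterated to stability; the
    breadth-first flood fill below computes the same result.
    """
    h, w = len(mask), len(mask[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                reached[y][x] = True  # marker: background pixels on the border
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not reached[ny][nx] and mask[ny][nx] == 0:
                reached[ny][nx] = True
                queue.append((ny, nx))
    # Background pixels the marker never reached are enclosed holes: fill them.
    return [[0 if reached[y][x] else 1 for x in range(w)] for y in range(h)]


closed_ring = [[0, 0, 0, 0, 0],
               [0, 1, 1, 1, 0],
               [0, 1, 0, 1, 0],
               [0, 1, 1, 1, 0],
               [0, 0, 0, 0, 0]]
open_ring = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 0, 0, 0],  # gap at (2, 3) links the hole to the border
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
print(fill_holes(closed_ring)[2][2])  # 1: enclosed hole is filled
print(fill_holes(open_ring)[2][2])    # 0: hole stays open because of the gap
```

Closing such gaps beforehand is exactly the role the text assigns to a sufficiently large $H$.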

5.5 Color similarity distance 

The color similarity parameter $\theta_{dist}$ describes the distance $\Delta E^*(u, v)$ that two colors $u, v$ of the L*a*b* color space may not exceed in order to be considered similar. Thus, the value of $\theta_{dist}$ has a direct impact on the number of color regions that are extracted from the original image. If $\theta_{dist}$ is set to a very low value, the resulting color regions are very small. This causes large relative mask relevance values for small color regions that are overlapped by a small area of the approximate mask, and thus fewer regions are removed in the subsequent region scoring stage. Too large values of $\theta_{dist}$ imply that too many regions are merged together, and thus the region scoring becomes too vague.

Fig. 17 shows the different disjoint color regions obtained by altering the parameter $\theta_{dist}$. Fig. 17a is the pixel subset of the original image marked by the smoothed mask that is returned by the mask approximation stage (Sec. 3.3) of the algorithm. If $\theta_{dist}$ is set to a small value as in Fig. 17b, a large number of disjoint color regions is created compared to Fig. 17c, where $\theta_{dist} = 25$. Very few color regions are constructed if a large value like $\theta_{dist} = 40$ is applied in the color segmentation stage, as illustrated in Fig. 17d. Usually, this makes the region scoring stage (Sec. 3.5) more vague. The reason is that for a large $\theta_{dist}$ it is more likely that $n > 1$ regions $r_1, \ldots, r_n$ with a high score variation

$$\min\{MR_{r_1}, \ldots, MR_{r_n}\} \ll \max\{MR_{r_1}, \ldots, MR_{r_n}\}$$

are merged into one larger region $r = r_1 \cup \ldots \cup r_n$, where $MR_{r_i}$, $i = 1, \ldots, n$, denotes the mask relevance introduced in Sec. 3.5. Deleting $r$ would then generate more false negatives, which means that the final mask does not cover the entire OOI. Keeping $r$ would generate more false positives, so that more background ends up in the final mask. As illustrated in Fig. 18, an increasing value of $\theta_{dist}$ also increases the processing time. For our tests, we chose $\theta_{dist} = 25$.
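To illustrate how $\theta_{dist}$ controls the number of color regions, the sketch below groups pixels by 4-connected region growing, adding a neighbor when its CIE76 difference $\Delta E^*_{ab}$ (the Euclidean distance in L*a*b*) to the region's seed color does not exceed $\theta_{dist}$. Both the CIE76 choice and the seed-based growing rule are assumptions for illustration; the algorithm's own color segmentation stage may group pixels differently, and it only considers pixels inside the smoothed mask. The function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Lab = Tuple[float, float, float]


def delta_e76(u: Lab, v: Lab) -> float:
    """CIE76 color difference: Euclidean distance between two L*a*b* triples."""
    return math.dist(u, v)


def color_regions(lab: Sequence[Sequence[Lab]], theta_dist: float = 25.0) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions whose colors stay within theta_dist of the seed color."""
    h, w = len(lab), len(lab[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            seed = lab[sy][sx]  # the first unlabeled pixel starts a new region
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and delta_e76(lab[ny][nx], seed) <= theta_dist):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


row = [[(10.0 * i, 0.0, 0.0) for i in range(6)]]  # L* = 0, 10, ..., 50
for theta in (5, 25, 60):
    print(theta, color_regions(row, theta)[1])
# prints: 5 6 / 25 2 / 60 1
```

A small threshold fragments the gradient into one region per pixel, whereas a large one merges everything, mirroring the trend in Figs. 17b to 17d.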
Fig. 17: Impact of the parameter $\theta_{dist}$ on grouping the smoothed approximate mask into regions of similar colors of the input image (Fig. 17a). Figs. 17b to 17d show the approximate mask with small ($\theta_{dist} = 5$), medium ($\theta_{dist} = 25$) and large ($\theta_{dist} = 40$) values. For better visual distinction, each region is represented in a random color.

Fig. 18: Impact of the color similarity distance $\theta_{dist}$ on the spatial distortion error and the average runtime per image.

Fig. 19: Impact of the region scoring parameter $\theta_{rel}$ on the spatial distortion error and the average runtime per image.



5.6 Region Scoring Relevance 

In the region scoring stage, we delete each region $r$ from the smoothed approximate mask $I_{app}$ at the $i$-th iteration if $MR^i_r < \theta_{rel}$. Fig. 19 shows the impact of $\theta_{rel}$ on the segmentation quality. Low values of $\theta_{rel}$ lead to fewer deleted regions, which in most cases leaves the final mask of the OOI surrounded by a thin border. Thus, the total number of false positives increases, and so does the spatial distortion. The overall influence of $\theta_{rel}$ is rather low because it only affects the refinement step Region Scoring (see Sec. 3.5).
6 Impact of DOF on Similarity

Measuring the similarity between two images is an essential step in content-based image retrieval (CBIR). If the image database contains a significant number of low DOF images, a DOF-based segmentation algorithm can improve the classification accuracy, because the extraction of features can be restricted to the subset of pixels contained in the OOI of the images.

Consider, for example, two images $I_1$ and $I_2$ that contain two semantically different objects $O_1$ and $O_2$ in their respective focused areas in front of a comparatively similar background. In this case, the distance between the images $I_1$ and $I_2$ should be significantly lower than the distance between the extracted objects $O_1$ and $O_2$, such that $d(O_1, O_2) \gg d(I_1, I_2)$.



Fig. 20 shows two sample images that practically display the same scene, but with the focal plane at a different distance from the lens, so that different OOIs are displayed sharply. As our main focus does not lie in the evaluation of the best possible feature descriptors or distance measures, we use the well-known color histograms for the classification of the data set. For each image, we create a color histogram $h$ with 12 bins. Between the color histograms $h(I_1)$ and $h(I_2)$ we can now measure the Minkowski-form distance $d_2$. For two histograms $Q$ and $T$, $d_p$ is defined as in Eq. (6), which corresponds to the Euclidean distance in the case of $p = 2$:

$$d_p(Q, T) = \left( \sum_{i=0}^{n-1} |Q_i - T_i|^p \right)^{1/p} , \qquad (6)$$

where $n$ is the number of histogram bins.


The distance between the color histograms of the complete images, $d_2(h(I_1), h(I_2)) = 334$, is considerably lower (3.8 times) than the distance between the histograms of their extracted OOIs, $d_2(h(O_1), h(O_2)) = 1277$ (see Figs. 20c and 20d). This suggests that a CBIR system could profit from an automatic segmentation of OOIs in low DOF images to improve search quality.
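Eq. (6) translates directly into code. The binning of the 12-bin color histogram is not specified in this section, so `hue_histogram`, which sorts hue angles into twelve 30° intervals, is a hypothetical stand-in rather than the histogram behind the reported distances.

```python
from typing import List, Sequence


def minkowski_distance(q: Sequence[float], t: Sequence[float], p: float = 2.0) -> float:
    """Eq. (6): d_p(Q, T) = (sum_i |Q_i - T_i|^p)^(1/p); p = 2 gives the Euclidean d_2."""
    if len(q) != len(t):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(qi - ti) ** p for qi, ti in zip(q, t)) ** (1.0 / p)


def hue_histogram(hues_deg: Sequence[float], bins: int = 12) -> List[int]:
    """Hypothetical 12-bin histogram over hue angles in degrees (30 degrees per bin)."""
    hist = [0] * bins
    width = 360.0 / bins
    for hue in hues_deg:
        hist[int((hue % 360.0) // width)] += 1
    return hist


h1 = hue_histogram([0, 15, 30, 359])
h2 = hue_histogram([0, 100, 200, 300])
print(minkowski_distance([1, 2, 3], [4, 6, 3]))  # 5.0
print(minkowski_distance(h1, h2, p=1))           # 6.0
```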

To briefly verify this hypothesis, we created a database containing 114 diverse DOF images divided into 17 classes: bird, bee, cat, coke, deer, eagle, airplane, car, fox, apple, ladybird, lion, milk, redtulp, yellowtulp and sunflower.

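Let $I_{ij}$ be the $j$-th image of the $i$-th class $G_i$. We then define the inner-class distance of an image $I_{ij}$ as the average distance of $I_{ij}$ to all other images $I_{ik}$, $k \neq j$, of the same class $G_i$:

$$dist_{inner}(I_{ij}) = \frac{1}{|G_i| - 1} \sum_{J \in G_i,\, J \neq I_{ij}} d\big(h(J), h(I_{ij})\big) ,$$

where $d$ is the distance measure, which is set to the Euclidean distance $d_2$ in our case. For a given distance measure, the average inter-class distance $dist_{inter}$ of an image $I_{ij}$ is its average distance to all images of the other classes:

$$dist_{inter}(I_{ij}) = \frac{1}{|\mathcal{I} \setminus G_i|} \sum_{J \in \mathcal{I} \setminus G_i} d\big(h(J), h(I_{ij})\big) ,$$

where $\mathcal{I}$ denotes the set of all images in the database.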
For a classification task, the average inner-class distance of an image should be smaller than its inter-class distance. To further improve the classification, it is thus desirable to increase the difference between the inter-class and the inner-class distance.
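Both averages can be computed per image in one pass over the database. The sketch below takes arbitrary feature vectors and a distance callable; with histograms stored as tuples, `math.dist` would give the Euclidean $d_2$ used here. The function name and the one-dimensional toy features are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")


def class_distances(features: Sequence[F], labels: Sequence[Hashable],
                    dist: Callable[[F, F], float]) -> List[Tuple[float, float]]:
    """Return (dist_inner, dist_inter) for every image.

    dist_inner averages over the other images of the same class,
    dist_inter averages over all images of the other classes.
    """
    result = []
    for a, (fa, la) in enumerate(zip(features, labels)):
        same = [dist(fa, fb) for b, (fb, lb) in enumerate(zip(features, labels))
                if b != a and lb == la]
        other = [dist(fa, fb) for fb, lb in zip(features, labels) if lb != la]
        inner = sum(same) / len(same) if same else float("nan")
        inter = sum(other) / len(other) if other else float("nan")
        result.append((inner, inter))
    return result


print(class_distances([0.0, 1.0, 10.0, 12.0], ["a", "a", "b", "b"], lambda u, v: abs(u - v)))
# [(1.0, 11.0), (1.0, 10.0), (2.0, 9.5), (2.0, 11.5)]
```

Comparing these pairs before and after segmentation gives the relative changes examined in the following experiment.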

In the following experiment, we measured the 
inner- and inter-class distance for all classes with- 
out any segmentation. Afterwards we applied the 
proposed segmentation algorithm to the images, re- 
peated the experiment and computed the relative 
changes of the inner- and inter-class distances. 

Fig. 21 summarizes the result of the experiment. The inner-class distance decreases to an average of 67% of its original value, which is a significantly stronger decrease than that of the inter-class distance, which decreases to an average of 81% of its original value. Thus, the difference between inter-class and inner-class distance improved by an average of 14 percentage points throughout the data set, with a maximum of 27 percentage points for the fox class and a minimum of 3 percentage points for the (glass of) milk class. These differences between the inner-class and the inter-class distances are illustrated in Fig. 22. We conclude that a CBIR task can profit from the DOF segmentation if the data set contains enough low DOF images.
Fig. 20: Comparatively similar input images $I_1$ and $I_2$ (Figs. 20a / 20b) with homogeneous defocused regions, small DOF and variant colors. The segmentation results $O_1$ and $O_2$ (Figs. 20c / 20d) show rather little similarity.

Fig. 21: Impact of the DOF segmentation on the inner- and inter-class distance of a test dataset (curves: inner-class, inter-class).

Fig. 22: Percentage points by which the inner-class distance was lowered more than the inter-class distance.



In this paper, a new robust algorithm for the segmentation of low DOF images is proposed that does not require any parameters to be set by hand, as all necessary parameters are either determined fully automatically or preset. Experiments are conducted on diverse sets of real-world low depth of field images from various categories, and the algorithm is compared to three reference algorithms. The experiments show that the algorithm is more robust than the reference algorithms on all tested images and that it performs well even if the DOF grows larger, so that the background begins to show considerable texture. We also demonstrated the positive impact of low DOF segmentation on image similarity in the case of CBIR. In our future work, we plan to improve the processing speed and accuracy of the algorithm. Furthermore, we plan to apply the algorithm to movies and to add an automatic detection of whether an image is low DOF or not. A Java WebStart demo of the algorithm can be tested online.¹ We also plan to publish the test data, as far as image licensing allows, together with the set of reference masks for the ROIs. The implementations of the reference algorithms will also be made available for download as ImageJ [1] plugins at the demo URL. We also plan to publish the proposed algorithm as an ImageJ plug-in in the near future.



¹ http://www.dbs.ifi.lmu.de/research/IJCV-ImageSegmentation/